Principal components analysis (PCA) is an ordination method that lets us capture as much of the variation in our multivariate data as possible in a reduced number of dimensions.
Here, we’ll use the penguins data from the palmerpenguins R package to explore variable relationships and clustering by species in a PCA biplot. For this example, we will only use the structural size measurements (bill length and depth, body mass, and flipper length).
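library(tidyverse)
library(palmerpenguins)
library(ggfortify) # provides autoplot() for prcomp objects

# Run PCA on the scaled size measurements (complete cases only)
penguin_pca <- penguins %>%
  dplyr::select(body_mass_g, ends_with("_mm")) %>%
  tidyr::drop_na() %>%
  scale() %>%
  prcomp()

# Keep the matching complete-case rows so species can be mapped in the biplot
penguin_complete <- penguins %>%
  drop_na(body_mass_g, ends_with("_mm"))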
autoplot(penguin_pca,
         data = penguin_complete,
         loadings = TRUE,
         colour = "species",
         loadings.label = TRUE,
         loadings.colour = "black",
         loadings.label.colour = "black",
         loadings.label.vjust = -0.5
         ) +
  scale_color_manual(values = c("blue","purple","orange")) +
  scale_fill_manual(values = c("blue","purple","orange")) +
  theme_minimal()
# It's not perfect, but it's enough for now...
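To judge how much of the total variance the first two axes of the biplot capture, one option is to compute the proportion of variance from the standard deviations stored in the prcomp object. The sketch below rebuilds the PCA with base R so it runs on its own; the names size_vars, penguin_sizes, pca_sketch, and var_explained are illustrative and not part of the lab.

```r
library(palmerpenguins)

# Complete cases for the four size measurements (same variables as above)
size_vars <- c("body_mass_g", "bill_length_mm", "bill_depth_mm", "flipper_length_mm")
penguin_sizes <- na.omit(as.data.frame(penguins[, size_vars]))

# Scale the variables, then run the PCA
pca_sketch <- prcomp(scale(penguin_sizes))

# Proportion of total variance explained by each principal component
var_explained <- pca_sketch$sdev^2 / sum(pca_sketch$sdev^2)
round(var_explained, 3)

# Loadings: how each original variable contributes to each component
pca_sketch$rotation
```

The same proportions also appear in the output of summary(pca_sketch), which may be the quicker check in practice.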
ggplot customization & reading in different file types
We spent some time in ESM 206 customizing our data visualizations. Let’s add some more tools, including:
- Highlight spaghetti plots with gghighlight
- An interactive graph with plotly
- A universe of color palettes in paletteer
Here, we’ll also read in stored .txt and .xlsx files, as well as files from a URL, to build our toolkit for reading in data.
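As a small illustration of the .txt case, the sketch below writes a made-up tab-delimited file to a temporary location and reads it back with readr. The file contents and the names txt_path and txt_data are hypothetical; a real file may use a different delimiter, in which case read_delim() with an explicit delim argument is one alternative.

```r
library(readr)

# Hypothetical tab-delimited text file, written to a temp location for illustration
txt_path <- tempfile(fileext = ".txt")
write_tsv(data.frame(site = c("a", "b"), count = c(3, 5)), txt_path)

# read_tsv() parses tab-delimited text into a tibble
txt_data <- read_tsv(txt_path, show_col_types = FALSE)
txt_data
```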
Data: NOAA Foreign Fisheries Trade Data
fish_noaa <- read_excel(here("data","foss_landings.xlsx")) %>%
  clean_names() %>%
  mutate(across(where(is.character), tolower)) %>% # convert all character columns to lowercase
  mutate(nmfs_name = str_sub(nmfs_name, end = -4)) %>% # remove last 3 characters
  filter(confidentiality == "public")
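The negative end index in str_sub() can be easy to misread, so here is a minimal sketch of the two cleaning steps on a single string. The label "TUNAS **" is a made-up stand-in, not a value taken from the NOAA file.

```r
library(stringr)

# Hypothetical species label with a 3-character suffix to trim
raw_name <- "TUNAS **"

# Lowercase first, then keep everything up to the 4th character from the end
lower_name <- tolower(raw_name)
trimmed <- str_sub(lower_name, end = -4)
trimmed # "tunas"
```

Because end = -4 counts from the end of the string, the same call removes the last three characters regardless of how long each name is.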
Now, let’s make and customize a graph:
fish_plot <- ggplot(data = fish_noaa, aes(x = year, y = pounds, group = nmfs_name)) +
  geom_line(aes(color = nmfs_name)) +
  theme_minimal()

# Make it interactive:
ggplotly(fish_plot)

# Highlight series based on condition(s):
ggplot(data = fish_noaa, aes(x = year, y = pounds, group = nmfs_name)) +
  geom_line() +
  gghighlight(nmfs_name == "tunas") + # Highlight just tunas
  theme_minimal()

ggplot(data = fish_noaa, aes(x = year, y = pounds, group = nmfs_name)) +
  geom_line(aes(color = nmfs_name)) +
  gghighlight(max(pounds) > 1e8) + # Highlight series whose maximum exceeds 1e8 pounds
  theme_minimal()
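For line plots, a condition such as max(pounds) > 1e8 is evaluated once per group rather than once per row. To preview which series such a condition would keep, a rough equivalent can be written with dplyr. The sketch below uses made-up values in a toy table (toy_fish) standing in for fish_noaa.

```r
library(dplyr)

# Toy data standing in for fish_noaa (values are made up)
toy_fish <- tibble(
  nmfs_name = rep(c("tunas", "crabs", "squids"), each = 3),
  year = rep(2018:2020, times = 3),
  pounds = c(2e8, 1.5e8, 1.8e8, 5e7, 6e7, 4e7, 9e7, 1.2e8, 8e7)
)

# Find the groups whose maximum exceeds the threshold
highlighted <- toy_fish %>%
  group_by(nmfs_name) %>%
  summarize(max_pounds = max(pounds)) %>%
  filter(max_pounds > 1e8)

highlighted$nmfs_name
```

In this toy table, "squids" and "tunas" pass the threshold while "crabs" does not, so only those two lines would stay colored.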
lubridate() refresher, and introducing paletteer for color palettes
View(palettes_d_names)
View(palettes_c_names)
Data: Monroe Water Treatment Plant Daily Electricity Use
Accessed from data.gov
Summary: “Daily energy use (kWh), demand (kW), and volume water treated (million gallons). 2010 through current. A second electric meter and account were added at the plant in March 2013. The usage and demand data from this meter are labeled as ‘Energy Use 2’ and ‘Peak 2.’”
The URL to the CSV file is provided at the website above (or copy from below):
monroe_wt <- read_csv("https://data.bloomington.in.gov/dataset/2c81cfe3-62c2-46ed-8fcf-83c1880301d1/resource/13c8f7aa-af51-4008-80a9-56415c7c931e/download/mwtpdailyelectricitybclear.csv") %>%
  clean_names()

monroe_ts <- monroe_wt %>%
  mutate(date = mdy(date)) %>% # Convert date to a stored date class
  mutate(record_month = month(date)) %>% # Add column w/ month number
  mutate(month_name = month.abb[record_month]) %>% # Add column w/ month abbreviation
  mutate(month_name = fct_reorder(month_name, record_month)) # Make month name a factor & reorder based on values in record_month column

ggplot(data = monroe_ts, aes(x = month_name, y = total_k_wh)) +
  geom_jitter(aes(color = month_name),
              show.legend = FALSE,
              alpha = 0.5,
              size = 0.3,
              width = 0.2) +
  scale_color_paletteer_d("palettetown::delibird")
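The date-wrangling steps above can be traced on a few values. The sketch below uses made-up dates in month/day/year text format (the format mdy() expects) and follows the same sequence: parse, extract the month number, look up the abbreviation, then order the factor levels by month number. The names raw_dates, parsed, and month_fct are illustrative.

```r
library(lubridate)
library(forcats)

# Made-up dates stored as month/day/year text
raw_dates <- c("3/15/2013", "11/2/2010", "1/30/2012")

parsed <- mdy(raw_dates)              # character -> Date
record_month <- month(parsed)         # month numbers: 3, 11, 1
month_name <- month.abb[record_month] # "Mar", "Nov", "Jan"

# Order factor levels by month number instead of alphabetically
month_fct <- fct_reorder(month_name, record_month)
levels(month_fct)
```

Without the reordering step, the levels would sort alphabetically and the months would appear out of calendar order on the x-axis. As an alternative, month(parsed, label = TRUE) returns an ordered factor of month abbreviations in one call.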
patchwork for compound figures
Let’s make two quick graphs and store them, then combine them using patchwork. See the patchwork package documentation for more information.
graph_a <- ggplot(data = penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point(aes(size = bill_length_mm, color = bill_depth_mm), show.legend = FALSE) +
  scale_color_paletteer_c("grDevices::RdBu")

graph_b <- ggplot(data = penguins, aes(x = species, y = flipper_length_mm)) +
  geom_jitter(aes(color = flipper_length_mm), show.legend = FALSE)

# Use | to put graphs side by side, and / to put one over the other.
graph_a | graph_b

# Store the output, apply updates across all graphs with &
graph_c <- graph_a / graph_b & theme_minimal()

# Export as a .png with ggsave (name the plot explicitly so the stored figure is the one saved):
ggsave(here("fig","graph_c.png"), plot = graph_c, width = 5, height = 5)

# Get even wilder (not a real example):
graph_c | graph_a

# Or something like:
(graph_a | graph_b | graph_a) / (graph_b | graph_a) & theme_dark()
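Beyond the | and / operators, patchwork also offers plot_layout() for relative panel sizes and plot_annotation() for automatic panel tags. The sketch below is a small self-contained illustration; it uses the built-in mtcars data as a stand-in for penguins, and the names p1, p2, and combined are hypothetical.

```r
library(ggplot2)
library(patchwork)

# Two quick plots from a built-in data set (mtcars stands in for penguins)
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

# Side by side, with the left panel twice as wide, tagged A and B
combined <- (p1 | p2) +
  plot_layout(widths = c(2, 1)) +
  plot_annotation(tag_levels = "A")

combined
```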